Bulk RNA-seq analysis of mice model melanoma

Material and methods

RNA expraction and sequencing

Total RNA was isolated using the RNeasy Plant Mini Kit (QIAGEN, Germany) following the manufacturer’s protocol. To eliminate potential DNA contamination, the RNA was further treated with the TURBO DNA-free Kit (Thermo Fisher Scientific, USA) in a 50 µL reaction volume. Purification was performed using Agencourt RNA Clean XP beads (Beckman Coulter, USA). RNA concentration and integrity were evaluated using the Quantitative RiboGreen RNA Assay (Thermo Fisher Scientific) and the RNA 6000 Pico Kit (Agilent Technologies, USA). RNA libraries were constructed using the NEBNext Poly(A) mRNA Magnetic Isolation Module followed by the KAPA RNA Hyper Kit (Roche, Switzerland), adhering to the manufacturer’s instructions. Final library purification was carried out with KAPA HyperPure Beads (Roche, Switzerland). Library fragment size distribution and purity were assessed using the High Sensitivity DNA Kit (Agilent Technologies). Quantification was performed with the Quant-iT DNA Assay Kit, High Sensitivity (Thermo Fisher Scientific). Equimolar pools of each library (10 pM) were subjected to high-throughput sequencing on the Illumina HiSeq 2500 platform using paired-end reads (2 × 100 bp) with a 2% PhiX spike-in control.

Bioinformatics analysis

Initial quality assessment of murine transcriptomic data was performed using FastQC (v0.12.1), with aggregated reports generated via MultiQC (v1.27.1) [Ewels et al., 2016]. Raw reads were processed using fastp (v0.23.4) to remove low-quality sequences [Chen et al., 2018]. Read alignment was subsequently conducted with STAR (v2.7.11) [Dobin et al., 2013] against the Mus musculus reference genome (GENCODE GRCm39.vM36) using the annotation file gencode.vM36.chr_patch_hapl_scaff.annotation.gtf, followed by gene-level quantification with HTSeq-count (v2.0.5) [Anders et al., 2015]. Differential gene expression analysis between experimental groups was performed using DESeq2 (v1.44.0) [Love et al., 2014]. For comprehensive functional interpretation, we implemented gene-set enrichment analysis (GSEA) through fgsea (v1.31.6) [Korotkevich et al., 2016] and clusterProfiler (v4.12.6) [Wu et al., 2021], utilizing both the MSigDB [Castanza et al., 2023] and CellMarker 2.0 [Hu et al., 2023] databases. Immune cell enrichment patterns were further assessed via Gene Set Variation Analysis (GSVA) [Hänzelmann et al., 2013]. Immune cell populations were deconvoluted from tumor transcriptome profiles using the mMCP-counter algorithm [Petitprez et al., 2020]. Co-expression network analysis was performed using BioNERO (v1.12.0) [Almeida-Silva et al., 2022], employing signed hybrid network topology with Pearson correlation metrics. Potential batch effects were addressed using the removeBatchEffect function from the limma package [Ritchie et al., 2015]. Functional annotation of co-expressed gene modules included over-representation analysis against multiple pathway databases including MSigDB, KEGG [Kanehisa et al., 2025], Reactome [Milacic et al., 2024], and WikiPathways [Agrawal et al., 2024]. Protein-protein interaction networks were reconstructed using the STRING database v12.0 through the STRINGdb (v2.16.4) Bioconductor package [Szklarczyk et al., 2023]. Visualization of analytical results was achieved using a combination of pheatmap (v1.0.12) for heatmap generation, ggplot2 (v3.5.1) for general plotting, and EnhancedVolcano (v1.22.0) for specialized visualization of differential expression results. The entire analytical workflow was implemented in R (v4.4.2).

flowchart LR
    A[("Raw reads")] --> B{"fastp"}
    B --> C{"FastQC"} & E{"STAR"}
    C --> D{"MultiQC"}
    E --> F{"HTSeq"}
    F --> G[("Count table")]
    G --> I{"DESeq2"} & Z{"limma"} & W{"PCA"}
    Z --> H{"PCA"} & J{"BioNERO"}
    I --> K{"GSEA"}
    J --> L{"ORA"} & N{"STRING"}
    G --> V{"GSVA"}

     A:::Aqua
     B:::Aqua
     B:::Pine
     C:::Pine
     E:::Pine
     D:::Pine
     F:::Pine
     G:::Aqua
     I:::Pine
     Z:::Pine
     W:::Pine
     H:::Pine
     J:::Pine
     K:::Pine
     L:::Pine
     N:::Pine
     V:::Pine
    classDef Aqua stroke-width:1px, stroke-dasharray:none, stroke:#46EDC8, fill:#DEFFF8, color:#378E7A
    classDef Pine stroke-width:1px, stroke-dasharray:none, stroke:#254336, fill:#27654A, color:#FFFFFF

References

  1. Ewels, Philip, et al. “MultiQC: summarize analysis results for multiple tools and samples in a single report.” Bioinformatics 32.19 (2016): 3047-3048.

  2. Chen, Shifu, et al. “fastp: an ultra-fast all-in-one FASTQ preprocessor.” Bioinformatics 34.17 (2018): i884-i890.

  3. Dobin, Alexander, et al. “STAR: ultrafast universal RNA-seq aligner.” Bioinformatics 29.1 (2013): 15-21.

  4. Anders, Simon, Paul Theodor Pyl, and Wolfgang Huber. “HTSeq—a Python framework to work with high-throughput sequencing data.” Bioinformatics 31.2 (2015): 166-169

  5. Love, Michael I., Wolfgang Huber, and Simon Anders. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome biology 15 (2014): 1-21.

  6. Hu, Congxue, et al. “CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data.” Nucleic acids research 51.D1 (2023): D870-D876.

  7. Hänzelmann, Sonja, Robert Castelo, and Justin Guinney. “GSVA: gene set variation analysis for microarray and RNA-seq data.” BMC bioinformatics 14 (2013): 1-15.

  8. Petitprez, Florent, et al. “The murine Microenvironment Cell Population counter method to estimate abundance of tissue-infiltrating immune and stromal cell populations in murine samples using gene expression.” Genome medicine 12 (2020): 1-15.

  9. Almeida-Silva, Fabricio, and Thiago M. Venancio. “BioNERO: an all-in-one R/Bioconductor package for comprehensive and easy biological network reconstruction.” Functional & Integrative Genomics 22.1 (2022): 131-136.

  10. Ritchie, Matthew E., et al. “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic acids research 43.7 (2015): e47-e47.

  11. Kanehisa, Minoru, et al. “KEGG: biological systems database as a model of the real world.” Nucleic Acids Research 53.D1 (2025): D672-D677.

  12. Milacic, Marija, et al. “The reactome pathway knowledgebase 2024.” Nucleic acids research 52.D1 (2024): D672-D678.

  13. Agrawal, Ayushi, et al. “WikiPathways 2024: next generation pathway database.” Nucleic acids research 52.D1 (2024): D679-D689.

  14. Szklarczyk, Damian, et al. “The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest.” Nucleic acids research 51.D1 (2023): D638-D646.

DNA extraction and sequencing

RNeasy Plant Mini Kit (QIAGEN, German)

TURBO DNA-free kit (Thermo Fisher Scientific, USA)

Agencourt RNA Clean XP (Beckman Coulter, Brea, USA)

Quantitative RiboGreen RNA assay (Thermo Fisher Scientific)

RNA 6000 Pico Kit (Agilent Technologies, Santa Clara, CA, USA)

NEBNext Poly(A) mRNA Magnetic Isolation Module

KAPA RNA Hyper Kit (Roche, Switzerland)

KAPPA HyperPure beads (Roche, Switzerland)

Highly Sensitive DNA Kit (Agilent Technologies)

Quant-iT DNA Assay Kit, High Sensitivity (Thermo Fisher Scientific)

Illumina HiSeq 2500

Databases

GENCODE: https://www.gencodegenes.org/mouse/

MSigDB: https://www.gsea-msigdb.org/gsea/msigdb

KEGG: https://www.genome.jp/kegg/

Reactome: https://reactome.org/

Wiki pathways: https://www.wikipathways.org/

CellMarker 2.0: http://www.bio-bigdata.center/

STRING: https://string-db.org/

Tools

FastQC: https://github.com/s-andrews/FastQC

MultiQC: https://github.com/MultiQC/MultiQC

fastp: https://github.com/OpenGene/fastp

STAR: https://github.com/alexdobin/STAR

HTSeq: https://github.com/simon-anders/htseq

DESeq2: https://github.com/thelovelab/DESeq2

fgsea: https://github.com/alserglab/fgsea

clusterProfiler: https://guangchuangyu.github.io/software/clusterProfiler/

GSVA: https://bioconductor.org/packages/release/bioc/html/GSVA.html

mMCP-counter: https://github.com/cit-bioinfo/mMCP-counter

biomaRt: https://github.com/grimbough/biomaRt

BioNERO: https://github.com/almeidasilvaf/BioNERO

limma: https://bioconductor.org/packages/release/bioc/html/limma.html

STRINGdb: https://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html

pheatmap: https://github.com/raivokolde/pheatmap

ggplot2: https://github.com/tidyverse/ggplot2

EnhancedVolcano: https://github.com/kevinblighe/EnhancedVolcano

R: https://www.r-project.org/

Raw data

PRJNA1214537: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1214537/

Results

Experimental Design and Sample Overview

The analysis included a total of 14 transcriptomic samples representing three experimental groups: 1) melanoma (n=6); 2) melanoma supplemented with Bifidobacterium adolescentis 150 (n=6); and 3) melanoma supplemented with Lacticaseibacillus rhamnosus K32 (n=2) (Table 1). The study was conducted in two sequential replicates: the first replicate comprised only the melanoma and B. adolescentis-supplemented groups (n=8), while the second replicate incorporated all three experimental conditions (n=6). The genomes of the bacteria used in this experiment are available at the following links GCF_001010915.1 and GCF_000735255.1. For clarity in data interpretation, we introduced the following standardized codings:

Experimental groups:

  1. Melanoma –> M

  2. Melanoma with B. adolescentis supplement –> M_BIF

  3. Melanoma with L. rhamnosus supplement –> M_LAC

Experimental batch:

  1. First experiment –> batch_1

  2. Second experiment –> batch_2

Quality control

Following quality filtering with fastp, aggregated quality metrics were compiled using MultiQC to summarize the FastQC reports. As illustrated in Figure 1, the MultiQC quality profile indicates that the sequencing data maintained adequate overall quality. Post-filtering, approximately 198 million reads (mean ± SD: 14 ± 5 mln reads per sample) were retained for downstream analysis, with an average of 16 ± 7% of reads per sample removed during quality control (see Table 2 for detailed filtering statistics). Transcript abundance was quantified using the HTSeq tool. Figure 3 displays the distribution of HTSeq counts across all experimental samples. While no statistically significant differences in read counts were observed between experimental groups, batch effects significantly influenced read distribution (Figure 3). The resulting gene expression count matrix was normalized using the median of ratios method, with normalized values presented in Table 4.

MiltiQC report

MultiQC full report

Figure 1. Summary MultiQC report.

Reads filtering by quality statistic

Mapping reads to reference counts

Saving 5 x 7 in image

Figure 3. Barplot denoted HTSeq counts per experimantal sample.

ANOVA

            Df Sum Sq Mean Sq F value   Pr(>F)    
batch        1  851.8   851.8 210.230 4.84e-08 ***
group        2    0.1     0.0   0.006    0.994    
Residuals   10   40.5     4.1                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Normalized counts table

Principal component analysis

Principal component analysis (PCA) was employed as a linear dimensional reduction technique for exploratory data analysis and visualization. Prior to PCA, the expression data underwent variance-stabilizing transformation (VST), which derives stable variance estimates from fitted dispersion-mean relationships. This transformation processes count data normalized by size factors, generating a matrix of approximately homeostatic values characterized by constant variance across the range of mean expression levels, while simultaneously accounting for library size differences. The transformed data were visualized through bi-dimensional projection of the first two principal components (Figure 4), with the proportion of explained variance for each component detailed in Figure 5. Permutation multivariate analysis of variance (PERMANOVA) revealed statistically significant associations between expression profiles and batch effects, while no significant relationships with experimental groups were detected.

Saving 5 x 4 in image

Figure 4. PCA visualization using regularized log transformation counts.
Saving 5 x 4 in image

Figure 5. The proportion of explained variance of each component.
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999

adonis2(formula = t(vsd) ~ batch + group, data = meta_data, permutations = 9999, method = "euclidean", by = "margin")
         Df SumOfSqs      R2      F Pr(>F)   
batch     1     9421 0.19069 3.7386 0.0086 **
group     2     7784 0.15756 1.5445 0.1325   
Residual 10    25199 0.51007                 
Total    13    49404 1.00000                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Differential expression analysis

Differential expression analysis was conducted using the DESeq2 package, employing the model formula ~ batch + group to account for batch-related dispersion effects. The analysis revealed positive log2 fold change (lfc) values associated with both B. adolescentis 150 and L. rhamnosus K32 supplementation groups, while negative values corresponded to the untreated melanoma control group. DESeq2 performs differential expression analysis by fitting generalized linear models to negative binomial distributed counts and evaluating linear contrasts between experimental groups. Our results demonstrated 3 significantly up-regulated and 16 down-regulated genes in the M_BIF group compared to the M control group. For the M_LAC comparison, 21 differentially expressed genes were identified relative to the M group, while no significant changes were observed in the reverse comparison. When directly comparing bacterial supplementation groups, 39 genes were associated with M_BIF and 116 with M_LAC. Differentially expressed genes were determined using stringent thresholds (adjusted p-value < 0.05, |lfc| > 1). Complete statistical results of the differential analysis are presented in Tables 4-6. Visual representation of the results through volcano plots (Figures 6-8) effectively illustrates the relationship between statistical significance (-log10 p-value) and magnitude of expression changes (lfc) across all comparisons.

M_BIF vs M

Figure 6. Volcano plot denoted differential expressed genes: M_BIF vs M.

M_LAC vs M

Figure 7. Volcano plot denoted differential expressed genes: M_LAC vs M.

M_BIF vs M_LAC

Figure 8. Volcano plot denoted differential expressed genes: M_BIF vs M_LAC.

Gene set enrichment analysis

Gene Set Enrichment Analysis (GSEA) was employed to evaluate whether predefined gene sets exhibited statistically significant and concordant differences between biological states (e.g., phenotypic groups). This computational approach was implemented using multiple databases, enabling comprehensive identification of biological functions associated with the experimental conditions. For this analysis, we specifically utilized the HALLMARK MSigDB gene sets. The complete GSEA results are presented in Figure 9 and Tables 7-9. To delineate cell-type-specific expression patterns, we performed marker analysis using the Mouse Cell Marker 2.0 database, with results visualized in Figure 10 and detailed in Tables 10-12. Complementary to this, Gene Set Variation Analysis (GSVA) was conducted to examine HALLMARK MSigDB gene sets in a sample-oriented manner (Figures 11-16). The investigation was extended to immune cell profiling through two complementary approaches: mMCP-counter was implemented for broad immune cell type quantification (Figure 17), whileGSVA was specifically applied to assess critical immune parameters including M1/M2 macrophage polarization ratios and CD8+ T cell abundance (Figures 18-21).

MSigDb Hallmark

M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Figure 9

Figure 9. Heatmap denoted NES values of MSigDB Hallmark pathways linked to experimental groups.

Cell Marker 2.0

M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Figure 10

Figure 10. Heatmap denoted NES values of immune cell types linked to experimental groups.

Specific HALLMARK pathways levels

Saving 3 x 4 in image

Figure 11. GSVA predicted counts of HALLMARK_INTERFERON_GAMMA_RESPONSE pathway in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  85.33   42.67   4.839 0.0339 *
batch        1  54.00   54.00   6.125 0.0328 *
Residuals   10  88.17    8.82                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Saving 3 x 4 in image

Figure 12. GSVA predicted counts of HALLMARK_INTERFERON_ALPHA_RESPONSE pathway in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2 100.33   50.17   5.595 0.0234 *
batch        1  37.50   37.50   4.182 0.0681 .
Residuals   10  89.67    8.97                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Saving 3 x 4 in image

Figure 13. GSVA predicted counts of HALLMARK_TNFA_SIGNALING_VIA_NFKB pathway in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)
group        2  74.67   37.33   2.742  0.112
batch        1  16.67   16.67   1.224  0.294
Residuals   10 136.17   13.62               
Saving 3 x 4 in image

Figure 14. GSVA predicted counts of HALLMARK_INFLAMMATORY_RESPONSE pathway in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  74.67   37.33   3.107 0.0892 .
batch        1  32.67   32.67   2.718 0.1302  
Residuals   10 120.17   12.02                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Saving 3 x 4 in image

Figure 15. GSVA predicted counts of HALLMARK_TGF_BETA_SIGNALING pathway in experimental groups.
            Df Sum Sq Mean Sq F value  Pr(>F)   
group        2 129.00   64.50  12.141 0.00211 **
batch        1  45.38   45.38   8.541 0.01523 * 
Residuals   10  53.13    5.31                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Saving 3 x 4 in image

Figure 16. GSVA predicted counts of HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION pathway in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2 106.33   53.17   5.472 0.0248 *
batch        1  24.00   24.00   2.470 0.1471  
Residuals   10  97.17    9.72                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Immune cell types deconvolution

mMCP-counter
Saving 3 x 4 in image

Figure 17. Mmcp counter predicted T cell values in experimental groups.
            Df Sum Sq Mean Sq F value   Pr(>F)    
group        2 149.33   74.67   30.90 5.24e-05 ***
batch        1  54.00   54.00   22.34 0.000808 ***
Residuals   10  24.17    2.42                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

GSVA

Saving 3 x 4 in image

Figure 18. GSVA predicted counts of M1 macrophage in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  80.67   40.33   4.211 0.0471 *
batch        1  51.04   51.04   5.328 0.0436 *
Residuals   10  95.79    9.58                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Saving 3 x 4 in image

Figure 19. GSVA predicted counts of M2 macrophage in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)
group        2  57.33  28.667   1.688  0.233
batch        1   0.38   0.375   0.022  0.885
Residuals   10 169.79  16.979               
Saving 3 x 4 in image

Figure 20. GSVA predicted counts of M1/M2 macrophage ratio in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  47.33   23.67   2.219 0.1594  
batch        1  73.50   73.50   6.891 0.0254 *
Residuals   10 106.67   10.67                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Saving 3 x 4 in image

Figure 21. GSVA predicted counts of CD8+ T cells in experimental groups.
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  37.33   18.67     1.6 0.2495  
batch        1  73.50   73.50     6.3 0.0309 *
Residuals   10 116.67   11.67                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Co-expression analysis

Gene co-expression analysis was performed to identify functionally related gene clusters and investigate their potential associations with phenotypic traits. Prior to analysis, technical batch effects were removed from the expression matrix using the removeBatchEffect function implemented in the limma package. PCA of the batch-corrected data showed similar sample clustering patterns to the uncorrected dataset (Figures 22 and 23), with PERMANOVA confirming the effective removal of batch effects while revealing no significant associations with experimental groups. The BioNERO package identified several co-expression modules showing differential associations with experimental conditions. Notably, the greenyellow and sienna3 modules demonstrated positive correlations with the untreated melanoma group, while the pink and royalblue modules showed negative associations (Figures 24-27, Tables 13 and 14). Functional characterization through over-representation analysis revealed significant biological pathways enriched in these modules (Tables 15 and 16). Hub gene analysis identified key regulatory elements within each module (Tables 17 and 18), with protein-protein interaction networks visualized for selected modules (Figure 28). Additional network characterization using STRING database highlighted potential functional relationships among module genes (Figures 29 and 30).

Batch correction

Saving 5 x 4 in image

Figure 22. PCA visualization using vst transformation batch corrected counts.
Saving 5 x 4 in image

Figure 23. The proportion of explained variance of each component.
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999

adonis2(formula = t(mat) ~ batch + group, data = meta_data, permutations = 9999, by = "margin")
         Df  SumOfSqs      R2      F Pr(>F)
batch     1 0.0001345 0.03397 0.4480 0.9468
group     2 0.0009222 0.23285 1.5355 0.1214
Residual 10 0.0030029 0.75823              
Total    13 0.0039604 1.00000              

Net construction

Saving 8 x 5 in image

Figure 24. Dengrogramm of genes and modules.
Saving 8 x 6 in image

Figure 25. Number of genes per module.

Figure 26

Co-expressed modules linked to M group

Figure 26. Co-expression module association with experimental groups.
Saving 6 x 12 in image

Figure 27. Expression profiles of modules linked to experimental groups.

Over-representation analysis

Hub genes analysis and visualization

Figure 28. Corelation network of co-expressed genes modules positively and negatively linked to M group.

STRING analysis of modules

Figure 29

Figure 29 - STRING website

Figure 29. STRING database network plot obtained using DEG positively linked to M group.

Figure 30

Figure 30 - STRING website

Figure 30. STRING database network plot obtained using DEG negatively linked to M group.